AILS-NTUA at SemEval-2025 Task 4: Parameter-Efficient Unlearning for Large Language Models using Data Chunking

Premptis, Iraklis, Lymperaiou, Maria, Filandrianos, Giorgos, Mastromichalakis, Orfeas Menis, Voulodimos, Athanasios, Stamou, Giorgos

arXiv.org Artificial Intelligence

The Unlearning Sensitive Content from Large Language Models task aims to remove targeted datapoints from trained models while minimally affecting their general knowledge. In our work, we leverage parameter-efficient, gradient-based unlearning using low-rank (LoRA) adaptation and layer-focused fine-tuning. To further enhance unlearning effectiveness, we employ data chunking, splitting forget data into disjoint partitions and merging them with cyclically sampled retain samples at a pre-defined ratio. Our task-agnostic method achieves an outstanding forget-retain balance, ranking first on leaderboards and significantly outperforming baselines and competing systems.
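The data-chunking step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name and the dict-based chunk format are assumptions; only the stated behavior (disjoint forget partitions padded with cyclically sampled retain examples at a fixed ratio) comes from the abstract.

```python
from itertools import cycle

def make_unlearning_chunks(forget_data, retain_data, num_chunks, retain_ratio):
    """Split the forget set into disjoint chunks and pad each chunk with
    retain examples sampled cyclically at `retain_ratio` retain-per-forget.
    (Illustrative sketch; names and structure are assumptions.)"""
    chunk_size = -(-len(forget_data) // num_chunks)  # ceiling division
    retain_iter = cycle(retain_data)                 # cyclic sampling
    chunks = []
    for start in range(0, len(forget_data), chunk_size):
        forget_chunk = forget_data[start:start + chunk_size]
        n_retain = int(len(forget_chunk) * retain_ratio)
        retain_chunk = [next(retain_iter) for _ in range(n_retain)]
        chunks.append({"forget": forget_chunk, "retain": retain_chunk})
    return chunks
```

Each resulting chunk would then be used for one round of parameter-efficient (e.g., LoRA) gradient-based unlearning, with the interleaved retain samples counteracting loss of general knowledge.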


Inside the Black Box: Detecting Data Leakage in Pre-trained Language Encoders

Xin, Yuan, Li, Zheng, Yu, Ning, Chen, Dingfan, Fritz, Mario, Backes, Michael, Zhang, Yang

arXiv.org Artificial Intelligence

Despite being prevalent in the general field of Natural Language Processing (NLP), pre-trained language models inherently carry privacy and copyright concerns due to their nature of training on large-scale web-scraped data. In this paper, we pioneer a systematic exploration of such risks associated with pre-trained language encoders, specifically focusing on the membership leakage of pre-training data exposed through downstream models adapted from pre-trained language encoders, an aspect largely overlooked in existing literature. Our study encompasses comprehensive experiments across four types of pre-trained encoder architectures, three representative downstream tasks, and five benchmark datasets. Intriguingly, our evaluations reveal, for the first time, the existence of membership leakage even when only the black-box output of the downstream model is exposed, highlighting a privacy risk far greater than previously assumed. Alongside, we present in-depth analysis and insights toward guiding future researchers and practitioners in addressing the privacy considerations in developing pre-trained language models.
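For intuition, a black-box membership attack of the general family studied here can be as simple as thresholding the downstream model's output confidence, since training members tend to receive more confident predictions than non-members. This is a generic sketch of that idea, not the paper's specific attack; the function name and threshold are illustrative assumptions.

```python
def confidence_attack(prob_rows, threshold):
    """Flag a record as a suspected training member when the downstream
    model's top predicted-class probability exceeds `threshold`.
    (Generic confidence-thresholding sketch, not the paper's method.)"""
    return [max(row) > threshold for row in prob_rows]
```

In practice such attacks are evaluated by sweeping the threshold and reporting the trade-off between true- and false-positive rates on known members and non-members.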


Investigating disaster response through social media data and the Susceptible-Infected-Recovered (SIR) model: A case study of 2020 Western U.S. wildfire season

Ma, Zihui, Li, Lingyao, Hemphill, Libby, Baecher, Gregory B., Yuan, Yubai

arXiv.org Artificial Intelligence

Effective disaster response is critical for affected communities. Responders and decision-makers would benefit from reliable, timely measures of the issues impacting their communities during a disaster, and social media offers a potentially rich data source. Social media can reflect public concerns and demands during a disaster, offering valuable insights for decision-makers to understand evolving situations and optimize resource allocation. We used Bidirectional Encoder Representations from Transformers (BERT) topic modeling to cluster topics from Twitter data. Then, we conducted a temporal-spatial analysis to examine the distribution of these topics across different regions during the 2020 western U.S. wildfire season. Our results show that Twitter users mainly focused on three topics: "health impact," "damage," and "evacuation." We used the Susceptible-Infected-Recovered (SIR) theory to explore the magnitude and velocity of topic diffusion on Twitter. The results displayed a clear relationship between topic trends and wildfire propagation patterns. The estimated parameters obtained from the SIR model in selected cities revealed that residents exhibited heightened levels of several concerns during the wildfire. Our study details how the SIR model and topic modeling using social media data can provide decision-makers with a quantitative approach to measure disaster response and support their decision-making processes.
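The SIR dynamics used above can be sketched with a simple forward-Euler integration of the standard equations dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I, where S, I, R are fractions of the user population and beta and gamma play the role of the topic transmission and loss-of-interest rates the authors estimate. This is a minimal sketch under those assumptions, not the paper's fitting procedure.

```python
def simulate_sir(beta, gamma, s0, i0, r0, steps, dt=1.0):
    """Forward-Euler simulation of the SIR model. Returns the (S, I, R)
    trajectory as a list of tuples; fractions sum to 1 at every step.
    (Illustrative sketch; parameter values are hypothetical.)"""
    s, i, r = s0, i0, r0
    history = [(s, i, r)]
    for _ in range(steps):
        ds = -beta * s * i * dt   # newly "infected" (engaging) users
        dr = gamma * i * dt       # users losing interest in the topic
        s, i, r = s + ds, i - ds - dr, r + dr
        history.append((s, i, r))
    return history

# Hypothetical parameters: a topic spreads (beta > gamma), peaks, then fades.
trajectory = simulate_sir(beta=0.4, gamma=0.1, s0=0.99, i0=0.01, r0=0.0, steps=100)
```

Fitting beta and gamma to observed per-topic tweet counts in each city would yield the magnitude and velocity estimates of topic diffusion discussed in the abstract.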


Nils Nilsson, 86, Dies; Scientist Helped Robots Find Their Way

AITopics Custom Links

Nils J. Nilsson, a computer scientist who helped develop the first general-purpose robot and was a co-inventor of algorithms that made it possible for the machine to move about efficiently and perform simple tasks, died on Sunday at his home in Medford, Ore. His death was confirmed by his wife, Grace Abbott. Dr. Nilsson was a member of a small group of computer scientists and electrical engineers at the Stanford Research Institute (now known as SRI International) who pioneered technologies that have proliferated in modern life, whether in navigation software used in more than a billion smartphones or in such speech-control systems as Siri. The researchers had been recruited by Charles Rosen, a physicist at the institute, who had raised Pentagon funding in 1966 to design a robot that would be used as a platform for doing research in artificial intelligence. Although the project was intended to create a general-purpose mobile "automaton" and be a test bed for A.I. programs, Mr. Rosen had secured the funding by selling the idea to the Pentagon that the machine would be a mobile sentry for a military base.